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ABSTRACT: A very fruitful approach to the solution of image segmentation 
and surface reconstruction tasks is their formulation as estimation problems 
via the use of Markov random field models and Bayes theory. However, the 
Maximum a Posteriori (MAP) estimate, which is the one most frequently 
used, is suboptimal in these cases. We show that for segmentation problems, 
the optimal Bayesian estimator is the maximizer of the posterior marginals, 
while for reconstruction tasks, the thresholded posterior mean has the 
best possible performance. We present efficient distributed algorithms for 
approximating these estimates in the general case. Based on these results, 
we develop a maximum likelihood procedure that leads to a parameter-free 
distributed algorithm for restoring piecewise constant images. To illustrate 
these ideas, tlie reconstruction of binary patterns is discussed in detail. 
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1. Introduction. 

A very powerful approach for solving computational early vision problems is 
to formulate them as estimation problems. The estimates are based on probabilistic 
models tliat embody both our prior generic knowledge about the behavior of 
the solution, as well as the process by which the observations are obtained. In 
particular, various researchers have used Bayesian estimation and Markov Random 
Field (MRF) models to perform image segmentation (Elliot et. al. (1983); Hansen 
and Elliot (1982); Geman and Geman (1983); Marroquin (1984b); Cohen and 
Cooper (1984)) and surface reconstruction tasks (Marroquin (1984a)). Most of these 
researchers have assumed that the Maximum a Posteriori (MAP) estimate, which 
is defined as the configuration which maximizes the posterior distribution, provides 
the best possible solution to these problems, and several methods (such as dynamic 
programming and "simulated annealing" (Kirkpatrick, et. al. (1983)), a form of 
stochastic relaxation) have been proposed to compute it. 

Let us consider, however, the example portrayed in figure 1. 

Panel (c) represents the MAP estimate of the binary MRF (a) (whose parameters 
aj-e assumed to be known) from the noisy observations (b). It is clear that the 
estimates shown in (d) and (e) (which we will describe later) are better than the MAP 
estimate from almost any viewpoint This suggests the use of criteria other than 
the value of the posterior probability to measure the performance of an estimator. 
In particular, we propose the use of the following criteria for the two classes of 
problems mentioned above: 

1.1. Error Criterion for the Segmentation Problem. 

Consider a field x with A^ elements each of which can belong to one of a finite 
set Qi of classes. Let Zi denote the class to which the i^^ element belongs. The 
segmentation problem is to estimate x from a set of observations {yi, . . ., t/^}. Note 
that %i does not necessarily correspond to the image intensity. It may represent, for 
example, the texture class for a region in the image (as in Elliot et. al., 1983), etc. 

A reasonable criterion for the performance of an estimate x is the number of 
elements that are not classified correctly. Therefore, we define the segmentation 
error e^ as: 
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Figure 1. (a) Sample function of a binary MRF. (b) Output of a binary symmetric channel 
(error rate: 0.4) (c) MAP estimate, (d) Monte Carlo approximation to the MPM estimate, (e) 
Deterministic approximation to the MPM estimate. 

1.2. Error Criterion for the Reconstruction Problem. 

In this case, we also consider a field x with N elements which can take values 
on finite sets {QJ, but now we assume specifically that i,- represents the intensity 
of an image (or the height of a surface) at site i. This suggests that an estimate x 
should be considered "good" if it is close to x in the ordinary sense, so that the 
total squared error: 

N 



will be a reasonable measure for its performance. 

Let us now derive the optimal estimators for these error criteria. 



2. Optimal Bayesian Estimators. 

In this section we introduce two estimators based on the posterior distribution 
P^Ij/.' The Maximizer of the posterior marginals xm}>m, which is obtained by 
maximizing separately the posterior marginal distribution of each element, and the 
Thresholded posterior mean xtpm- Their formal definition is as follows: 

^MFM[i) = q ■ Pi\y(q] y) = sup {Pi\y{r; y)} U) 

where P.-jy is the posterior marginal distribution of tlie i^^' element: 

A|y('?;^)= E Px\y{^\y) (5) 

x:xi=q 

The TPM estimate is: 

^TPM['i)^q ■ iq-Xif= inf {(r - 5,f} (6) 

where x is the posterior mean: 

x = ^xP^^y{x;y) (7) 

X 

We will show that the MPM and TPM estimators are optimal with respect to the 
error criteria (1) and (3) respectively. In what follows, we will assume that the prior 
and conditional distributions P,, Py^^ are known, so that the posterior distribution 
can be obtained from Bayes rule. 

Proposition 1: xmpm is optimal with respect to the segmentation error e,. 

This is not a new result (see for example Abend, 1968); however, we present here 
a simple proof: 

Let X = xmpm^ and let z denote any other estimate for x. We will show that 

EMx,^)]<Ele,{x,z)] 

where the expectation is taken over all x and all possible observations y. 
For any fixed y we have, from (1): 

e,(z I y) = E{e,{x, ^) 1 1/] = ^ E (^ " H^i ~ iiFx|y(x; y) 

X t'=l 
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with P,iy given by (5). Also, 

TV 

M^ \y)== E(^-^t|y(^r;y)) 

1=1 

Therefore, 

N 

e,(i I y) - e,(z \ y) = - Yl [P^\y{i^; y) ~ Pr\y[Zr; v)] (8) 

t = l 

But by the definition (4) of z, each term on the right hand side of (8) is non-positive, 
so that 

es[x I y) - es(z \ y) < for all y 

and therefore, 

E[es{x,x)] < E[e,(x,z)] | 



Note that we have not made any assumption about P^,, P^^i^, or {QJ, so that 
this is a completely general result. In particular, the sets of valid classes Qi do not 
have to be the same for all i , so that tliis result holds if, as in Geman and Geman 
(1983), X is the pair {/,/}, where / denotes the intensity of a piecewise constant 
image, and / the boundaries between uniform regions. 

Proposition 2: xtpm is optimal with respect to the reconstruction error e^. 

Proof: Again, we put a; = xtpm, and let z denote any other estimate. For any fixed 
y we have, from (3): 

M^\y) = J2Ei^i-^i?Px\yi^;y] 
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also, 
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so that 



A' 
er{x I y) ~ er{z \ y) = Y^ [{x, ~ x,f - [x, ~ z,)~] < 

1=] 



from the definition (6) of j i 
Remarks: 

1. In the limit of fine quantization the variables x, become continuous (i.e., Q, =R 
for all i). In this case we recover the classical result that establishes that the posterior 
mean minimizes the mean squared error; for the particular case of Gaussian (Habibi, 
1972; Nahi and Assefi, 1972) or isotropic fields (Levy and TsiLsiklis, 1983), it is then 
possible to construct efficient algorithms tliat compute this quantity. 

2. The surface reconstruction problem treated by Marroquin (1984a) can be 
considered a particular instance of the reconstruction problem defined here. If we 
let X be the pair {/,/}, where / is the (continuous valued) depth process and / is 
a binary line process, the optimal estimator will be xtpm (note that for a binary 
variable, XTPMi'^ = ^mpm{'^))- 

3. These results can also be applied to the case of an m-ary line process (like die 
one suggested by Geman and Geman (1983)) by defining an appropriate mixed 
error: 

e™(/, /, 7, 1) = E(/,- - hf + X £(1 - 6{l, - k)) 

with 6 as defined in equation (2). One can show, using arguments similar to those 
of the proofs above, that the optimal estimate, for any positive value of X will 
t>e {Itpm^mpm}- Note that the posterior mean for / ajid the posterior marginal 
distributions for / must be computed by averaging over all possible values of both 
/ and /: 

f I 



3. Algorithms. 

As in the case of the MAP estimate, the exact computation of the optimal MPM 
and TPM estimates is a formidable task, except for very small N. It is possible, 
however, to define general distributed Monte Carlo procedures tliat will allow us to 
approximate these values as precisely as we want. 

The general idea is to use the Metropolis algorithm (Metropolis, 1953) to 
generate global configurations of the field x which are, asymptoticall.y, distributed 



according to P^ I y. Since the Markov chain defined by the Metropolis algorithm is 
known to be ergodic (Geman and Geman, 1983), we can approximate x by: 



X: 
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E <^ (9) 



and the posterior marginals by: 



^•bW^fcTTi £%!■"- 9) (10) 



t=k 



where a:(*) is the configuration generated by the Metropolis algorithm at time t, and 
k is the time required for the system to be in thermal equilibrium. From these 
values, XMPM and xtpm can be easily computed using (4) and (6). 

If we compare this procedure with the use of "Simulated annealing" (Kirkpatrick, 
et.aL, 1983) for approximating the MAP estimate, we note that even from a 
computational viewpoint it has some distinct advantages: 

1. It is difficult to determine in general the descent rate of the temperature 
(annealing schedule) that will guarantee the convergence of the annealing 
process in a reasonable time (it usually involves a trial and error procedure). 
Since we are running the Metropolis algorithm at a fixed temperature (T = 1), 
this issue becomes irrelevant 

2. Since in our case we are using a Monte Carlo procedure to approximate 
the values of some integrals, we should expect a nice convergence behavior, in 
the sense that coarse approximations can be computed very rapidly, and then 
refined to an arbitrary precision. 

The main disadvantage of this procedure is that in the case of the segmentation 
problem, a large amount of memory might be required if the number of classes 
per element m is large (we need to store the N{m - 1) numbers that define the 
posterior marginals). 

With respect to the relative performance, we point out that for high signal to 
noise ratios, the MAP estimate is usually close to the optimal one (which explains its 
good performance in many cases). If the noise level is high, however, the difference 
in the performances of the two estimators may be dramatic (see fig. 1). 

Finally, from a conceptual viewpoint, the definition of the solution of the 
estimation problem in terms of the posterior mean (or the posterior marginals) 
may lead in some cases to the construction of efficient deterministic algorithms for 
approximating the solution. To make these ideas concrete, we now analyze in detail 
a specific instance of the segmentation problem. 



4. Optimal Segmentation of Binary MRF's. 

In this section we will consider the following particular example of the general 
segmentation problem: 

Let / represent a binary pattern on a rectangular lattice L with N elements, 
which is a particular realization of a Markov Random Field with potentials given 
by: 

r-1, if /, = /,■ and||2-y||G(0,l] 
n/n/y)= 1, if hy^fj and|K-;||G(0,l] 

vO, Otherwise 

where i and ; are sites of L and \\i - j\\ is the distance betweeen i and ;" (i.e., / 
is a realization of a two-dimensional Ising net). Without loss of generality, we will 
assume that Ji G {0, 1} for all i G L. 

The probability density of the configurations F is given by: 

Pr(F = /)=iexp[-lt/(/)] (11) 

where Z is a normalizing constant; the energy U[f) is: 

i,jENN 

where i,j G A^a^ means that i and j are nearest neighbors, and Tq is the natural 
temperature of the system. 

Suppose that / is sent through a binary symmetric channel with error rate e, 
so that the conditional distribution of the observations g is: 



P(,,|/,) = p-^)' ^^< = f; 



,fi,9ie {0,1} 

The posterior distribution is: 



where ^p is a constant, and the posterior energy Up is: 



Pfi,{f;g) = i-e-"^ (12) 



^^ = i E V(fi< fi) + ^ E(l - *(/.■ - S.)) (13) 



with 

a = In 




Figure 2. Ratio of tlic average errors of die M AP and MPM estimators for a 2 x 2 Ising ncL 
4.1. Error Analysis. 



Using equations (1) and (12), it is not difficult to write down the exact expressions 
for the average segmentation error corresponding to tlie MAP and MPM estimators 
for this problem. However, the numerical evaluation of these expressions is feasible 
only for small values of A''. 

In figure 2 we show a plot of the ratio: 









for a 2 X 2 lattice, for different values of tlie error rate e and the natural temperature 
To. As expected, r is never less than 1. In the worst case (for e = 0.1 and To = 2) 
the error of tlie MAP estimate is 1.17 times that of the MPM estimate; if Tb is not 



too small and t is not too large, both estimates coincide, and as t approaches 0.5 
(low signal to noise ratio), the MPM estimate is consistently better than the MAP. 
An experimental analysis of larger lattices reveals a similar qualitative behavior, but 
the values of r are much larger in this case. 

4.2. Example. 

We now return to the example presented in figure 1, and examine it in more 
detail. Panel (a) represents a typical realization of a 64 x 64 Ising net with free 
boundaries, using a value of To = 1.74 (0.75 times the critical temperature of the 
lattice); panel (b), the output of a binary symmetric channel with error rate e = 0.4; 
panel (c) the MAP estimate, and panel (d) an approximation to the MPM estimate 
(which we will label "MPM (M.C.)") obtained using tlie Metropolis algorithm and 
equation (10) to estimate the posterior density. The corresponding values of the 
posterior energy Up (equation (13)) and the relative segmentation error (e,/642) are 
shown on table 1. 





/ 
-5594.8 


Table 1 






Energy 
Seg. Error 


^ fMAP 

-226.0 -6660.9 
0.4 0.33 


fMPMiM.C.) 

-6460.0 
0.128 


A 

-6427.0 
0.124 



4.3. A Fast Algorithm. 

For this problem, it is possible to construct a fast deterministic algorithm whose 
experimental performance (in terms of the average segmentation error) is equivalent 
to the Monte Carlo method discussed above. It is based on the following ideas: 

First, we note that for a binary pattern, the MPM and TPM estimates coincide. 
We will approximate the posterior mean of (12) by that of a Gaussian distribution 
Pg with the property: 

Pcih) = ^e-^^^^) for all /i G {0, 1}. 
/>P 

In particular, we use: 

Pa[h) = ^ exp[-l Y. {hi - hjf - a Uhi - 9i)% 
For this distribution, h is the (unique) minimizer of the convex function: 



which corresponds to the unique fixed point of the system: 



where 

Ni = {jeL : ||z-;|| = l}. 

We could now approximate our estimate by putting: 

/• = eChi) 

where 

^ ^ 10, otherwise ^ ^ 

There is an additional consistency condition that / must satisfy, however. It can 
be shown that when the posterior distribution has the form given by (12) and (13), 
the MPM estimate /, which by definition satisfies: 

also satisfies: 

^z|,(/.-;7)>Pii,((l-/i);/) (16) 

which means that if we replace the observations by the MPM estimate, and compute 
a new MPM estimate for this modified problem, we should get the same result. 
Translating this condition to the case of / , we get that it must satisfy: 

/* = ©(/i*) (17) 

where /i* satisfies: 

*~ m + aTo 

In practice, we get h* as the fixed point of the system: 



\Ni\ + aTo 



(18) 



with 

h'i') = h 



10 



Note that the function: 

acts as a Lyapunov function for the system (18), which is therefore (locally) stable 
(Vidyasagar, 1978). 

This algorithm can be visualized as operating in two steps: In the first one, 
we extract all the information that we need from the observations and encode it in 
h (which is continuous-valued), and in the second one, we find the closest binary 
pattern that satisfies the consistency condition (16). 

To illustrate the performance of this approximation, we show / , for the 
example discussed above, in panel (e) of figure 1, and its corresponding energy and 
segmentation error in the last column of table 1 (labeled "MPM del"). 

4.4. Dimensionality of the Parameter Space. 

Equations (14) and (18), which describe the deterministic approximations to 
Impm depend on the parameters of the system, e and To, only through the product: 

7 = aro = rolog(^-^j (19) 

which means that the behaviour of the algoritlim is completely characterized by the 
single parameter 7. In the case of the Monte Carlo approximation (10), if we fix 
the value of 7, the value of Tq cannot be chosen arbitrarily, since it has to satisfy 
the consistency condition: 

a 
with 



a = log 
1 



e 

N 



^ = T7 E ^* (20) 



^U 



where z is the residual process defined as: 



lo, otherwise 
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(21) 



This means that, given 7, the correct value of To can, in principle, be determined 
in an adaptive way, so that in this case too, the behaviour of the approximation 
depends effectively only on 7. 

5. Simultaneous Estimation of the Field and the Parameters. 

The above considerations suggest that we can estimate simultaneously the 
original field andtht parameters of the system by looking for the supremum of an 
appropriate likelihood function over tlie one-dimensional subspace parametrized 
by 7. We proceed as follows: 

For a given value of 7, we can approximate the corresponding MPM estimate 
/ using tlie methods developed in the previous section, and compute the residual 
process z and the conditional (on 7) Maximum Likelihood Estimate of the error 
rate e using equations (20) and (21). The corresponding conditional estimate for To 
will be: 

to = | (22) 

To measure the "likelihood" of the estimate /, we use the degree of uniformity 
(or "whiteness") of the residual process z. This property can be quantified by the 
variance of the local noise density, which we estimate as follows: 

We cover the lattice with a set {5y} of m non-overlapping squares (say, 8 
pixels wide). For each square Sj, the relative variance of the noise density is: 

^^■ = (^) ' (23) 

with 

where |5y| is the area of the j^^ square. 
The desired likelihood function is: 

m 

Uf)--E^: (24) 

Note that a more conventional likelihood function, such as the conditional 
likelihood proposed by Besag (1972), will not work in this case; this function is 
defined as: 

x(/) = ^i%M) ,ith 

Lk{f)= JlPCfi\fjJeNi,fo) = 
ieCk 
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= n 

ieCk 



exphf E;€^.m,/;)] 



= ii(i+^xp[^ j: vCfi,fj)])-\ ^=1,2 

iec. To jeNi 

where the "codings" Ci and C2 are sets of conditionally independent sites defined 
as: 

Ci = {i : [xi is odd and yi is even) or (xi is even and yi is odd)} 

C2 = {i : (xi is odd and y^is odd) or (x^- is even and y,- is even)} 

with {xi,yi) denoting the row and column indices of site i. In our case, we find 
that as 7 decreases, / becomes more and more uniform, while To remains almost 
constant. It is not difficult to see that as a result, the conditional likelihood L will 
decrease monotonically with 7, which renders it useless for our purpose. 

The range of values [70, 7m] of the parameter 7 that corresponds to the class 
of systems of interest can be determined as follows: 

One can show that for 7 > 8 we will always have fMPMi = 9i for all i, so that 
we can use -fM = 8. The value of 70 can be obtained from an upper bound for e 
and a lower bound for Tq. For example, assuming that 6 < .45 and To > .ST^, we 
get 70 = .23. (Note that when the natural temperature To of a first order, isotropic 
MRF is below 0.5 times T^ (the critical temperature of the lattice; see Kindermann 
and Snell, 1980), the patterns become practically uniform (i.e., /,• =constant for all 
iX while for values of To greater than l.ST,, we get patterns that are practically 
indistinguishable from white noise. Therefore, the assumption To > .STc covers 
practically all the interesting cases). 

The complete estimation procedure is as follows: 
Maximum Likelihood Estimation Algorithm: 

1: Sample the interval [70, 7^] at the points 

70 < 7b •••7n < 7Af 

2: For each 76 e = {71,... 7n} : 
2.1: Find f{l) using (14) and (18). 
2.2: Compute z using (21). 

2.3: Compute e using (20). If e = 0, set iCfil)) = -00 and proceed with the 
next value of 7. Otherwise, compute a and go to 2.4. 

2.4: Compute To using (22). 
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(a) 







(b) 




(c) 



Figure 3. (a) Synthetic image, (b) Noisy observations, (c) Maximum Likelihood Estimate. 

2.5: Compute 1(7(7)) using (23) and (24). 
3: Compute the optimal estimate / using: 



/ = /(7 ) : I(/(7 )) = sup £(/(7)) 



(25) 



The corresponding e*,tQ will be the optimal estimates for e and To, respectively. 
Remarks: 

1. This estimation algorithm allows us to reconstruct a binary pattern / from 
the noisy observations g without having to adjust any free parameters. The only 
prior assumptions correspond to the qualitative structure of the field / (first order, 
isotropic MRF) and to the nature of the noise process, but neither the natural 
temperature Tq nor the error rate e have to be known in advance. In practice, this 
means that we can apply it to restore any binary image with uniform granularity, 
even if it has not been generated by a Markov random process. We have used this 
algorithm to reconstruct a variety of binary images with excellent results. In figure 3 
we show such a restoration. The observations (b) were generated from the synthetic 
image (a) with an actual error rate of .35 (assumed unknown). The MLE for / is 
shown in (c). 

2. This procedure can be easily extended to handle any one-parameter noise 
corruption process (such as zero mean, additive white Gaussian noise). The extension 
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to the case of N-ary fields, i.e., to the restoration of piecewise constant images, is 
also straightforward (using the general algorithm (10) together with equation (4) 
instead of (14) and (18) in step 2.1). 

3. We have found that the likelihood function (24) is reasonably well behaved as 
a function of 7. This permits us to perform the one-dimensional search for its 
supremum in an economical way, by first determining its approximate location using 
a coarse sampling pattern, and then refining its position by a fine sampling of a 
reduced interval. In practice, it is possible to get very good results using on the 
order of 15 samples. 

4. The whole procedure is highly distributed, so that it is possible to implement it 
in parallel hardware in a very efficient way. 

6. Conclusions. 

A very fruitful approach to the solution of image segmentation and surface 
reconstruction tasks is their formulation as estimation problems, via the use of 
MRF models and Bayes theory. However, the MAP estimate, which has been often 
used, will not always give the best solution. We have shown that, for segmentation 
problems, the optimal Bayesian estimator is the maximizer of the posterior marginals 
(MPM), while for reconstruction tasks, the thresholded posterior mean (TPM) has 
the best possible performance. In some cases, tliese estimators will give results 
that are similar to those obtained by the MAP procedure, but in many interesting 
situations (in particular for low signal to noise ratios) their performance will be 
significantly better. 

The exact computation of these estimates (including the MAP) is computationally 
unfeasible in most practical situations. We have presented a general Monte Carlo 
procedure which approximates their value as precisely as desired (at the expense of 
increased computational cost). This procedure can be easily implemented in parallel 
hardware, and it has the advantage that coarse solutions are computed very fast, 
and are then progressively refined. We have also shown how in particular cases, it is 
possible to use the conceptual framework provided by these estimators to construct 
fast algorithms with very good experimental performance. 

Finally, we have developed a maximum likelihood procedure for the simul- 
taneous estimation of a piecewise constant field and the parameters of the system. 
This construction leads to a parameter-free distributed algorithm for reconstructing 
piecewise constant images from noisy observations. 
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